Functional Copy-Number Alterations as Diagnostic and Prognostic Biomarkers in Neuroendocrine Tumors

Functional copy-number alterations (fCNAs) are DNA copy-number changes with concordant differential gene expression. These are less likely to be bystander genetic lesions and could serve as robust and reproducible tumor biomarkers. To identify candidate fCNAs in neuroendocrine tumors (NETs), we integrated chromosomal microarray (CMA) and RNA-seq differential gene-expression data from 31 pancreatic (pNETs) and 33 small-bowel neuroendocrine tumors (sbNETs). Tumors were resected from 47 early-disease-progression (<24 months) and 17 late-disease-progression (>24 months) patients. Candidate fCNAs that accurately differentiated these groups in this discovery cohort were then replicated using fluorescence in situ hybridization (FISH) on formalin-fixed, paraffin-embedded (FFPE) tissues in a larger validation cohort of 60 pNETs and 82 sbNETs (52 early- and 65 late-disease-progression samples). Logistic regression analysis revealed the predictive ability of these biomarkers, as well as the assay-performance metrics of sensitivity, specificity, and area under the curve. Our results indicate that copy-number changes at chromosomal loci 4p16.3, 7q31.2, 9p21.3, 17q12, 18q21.2, and 19q12 may be used as diagnostic and prognostic NET biomarkers. This involves a rapid, cost-effective approach to determine the primary tumor site for patients with metastatic liver NETs and to guide risk-stratified therapeutic decisions.


Introduction
Neuroendocrine tumors (NETs) are epithelial neuroendocrine neoplasms with high metastatic potential. Up to 60% of patients have advanced metastatic liver disease at the time of diagnosis [1,2]. Primary NETs in the abdomen are regularly missed due to small tumor size, body habitus, and limitations of abdominal imaging [1]. Twelve to fifty percent of primary gastroenteropancreatic NETs (GEP NETs) go undetected by imaging [2].
GEP NETs that originate in the pancreas (pNETs) and small bowel (sbNETs) comprise half of all GEP NETs [3]. The two tumor sites cause similar signs and symptoms, including abdominal pain, nausea, poor appetite, weight loss, and diarrhea [3]. Distinguishing between these two primary sites is important for guiding proper clinical management.
Immunohistochemistry is routinely used to infer site of origin in metastatic NETs of occult origin [14][15][16]. The most commonly employed markers include CDX2 (midgut) [17], polyclonal PAX8 (pancreas) [18], TTF-1 (lung) [19], and SATB2 (rectum) [20]. None of these are NET-specific, though, and all were adapted for this purpose from more widespread use as markers of adenocarcinoma site of origin [21]. Because of the infrequency of NETs relative to other malignant epithelial tumors, complementary (serotonin for midgut) or best-in-class (islet 1 and PAX6 for pancreas and OTP for lung) NET-specific markers are not on the test menus of most clinical laboratories [15,22,23,24]. Many laboratories have switched from polyclonal to better-performing (for adenocarcinoma applications) monoclonal PAX8 antibodies, the latter of which are non-reactive in pancreatic NETs (i.e., polyclonal PAX8 positivity in pNETs is due to cross-reactivity with PAX6). Even with access to all the best-performing markers at the University of Iowa, up to 5% of NETs still defy site-of-origin assignment after exhaustive immunohistochemical and radiologic evaluation. Against this backdrop, the lead authors of the 2022 World Health Organization NET Classification stated that there remains a critical need for additional biomarkers to better classify NETs [25].
Copy-number alterations (CNAs) are changes in chromosomal content that result in gain or loss of copies of DNA segments, while functional copy-number alterations (fCNAs) are changes in chromosomal content that also impact expression of genes within the region. Somatic fCNAs are predicted to be cancer drivers as opposed to bystander lesions [26]. CNAs can be detected through techniques such as whole-genome sequencing (WGS), optical genome mapping, chromosomal microarray platforms (CMA), or fluorescence in situ hybridization (FISH), while fCNAs can be detected through pairing those techniques with gene-expression analysis. Over the last decade, CNA-profile studies of NETs have yielded promising results, but notable discrepancies between studies have been observed [27][28][29][30][31][32][33][34][35][36][37]. Robust analysis of CNA profile differences between pNETs and sbNETs is still needed.
We present here the results of a pipeline used to identify differences in fCNA profiles between pNETs and sbNETs, as well as between early- and late-disease-progression tumors, with the goal of developing a panel of clinically applicable FISH probes for use in diagnostic and prognostic NET biomarker testing. This type of testing has direct relevance to metastatic NETs of unknown origin detected in the liver. To identify diagnostic NET biomarkers, a discovery cohort of patient-matched normal primary metastatic tumors underwent CMA and RNA-sequencing to identify fCNAs in 64 patients diagnosed with GEP NET (31 pNETs and 33 sbNETs). We further analyzed the prognostic value of these fCNAs in NETs, specifically with regard to time to progression. We replicated fCNAs using FISH on patient-specimen-derived tissue microarrays (TMAs) and selected probes based on their location within fCNAs of interest. Based on these collective findings, we propose a testing algorithm for primary site (pNET vs sbNET) and disease progression (late disease progression [LDP] or early disease progression [EDP]) that can be rapidly and inexpensively performed on formalin-fixed, paraffin-embedded (FFPE) NET tissue. Overall, these findings support fCNAs' utility as effective diagnostic and prognostic biomarkers in NETs.

Differences in Copy-Number Alteration between pNETs and sbNETs
To establish primary and metastatic CNA profiles in pNETs and sbNETs, normal primary metastatic tumors underwent CMA profiling. In total, 47 pNET (28 primary, 19 metastatic) and 63 sbNET (32 primary, 31 metastatic) specimens were assessed by CMA. No significant CNA differences were observed between primary and metastatic tumors for either NET type (Supplementary Figures S1 and S2).
Considerable evidence suggests there are unique CNAs in pNETs and sbNETs, but discrepancies between studies have been observed [27][28][29][30][31][32]. To determine the comprehensive CNA profiles of pNETs and sbNETs, normal-tissue-primary-tumor pairs underwent CMA profiling. A total of 49 pNET and 63 sbNET patients were tested. We observed more copy-number-gain events in chromosomes 4, 5, 7, 9, 12, 13, 14, and 17-20 in pNETs (Fisher's exact test p < 0.05). Copy-number-loss events in chromosomes 9 and 18 were detected almost exclusively in sbNETs (Fisher's exact test p < 0.05). These observations indicate robust and reproducible CNA-profile differences between pancreatic and small-bowel NETs (Figure 1 and Supplementary Table S1).
Functional copy-number alterations (fCNAs) are somatic changes in chromosomal content that affect the expression of genes within the region. To establish fCNA profiles in pNETs and sbNETs, we performed patient-matched RNA-sequencing of 31 pNETs (Supplementary Table S2) and 33 sbNETs (Supplementary Table S3) that underwent CMA profiling. To analyze the prognostic value of fCNAs in NETs, we separated our samples based on time to progression: late disease progression, or LDP (>24 months until progression, n = 17), and early disease progression, or EDP (<24 months until progression, n = 46). CNAs that were present in at least 15% of the cohort with significant concordant gene-expression changes (increased expression in a chromosomal gain and decreased expression in a chromosomal loss) were identified as potential fCNA biomarkers (Table 1). fCNA profiles were compared, and a panel of differentiating fCNAs was selected (Table 2). The results showed that gains in 5q31, 7q31.1, 9p22.1, 17q11.2, 18q21.1, and 19q13.2 indicated a pancreatic primary tumor site, while a gain in 4p16.3 and a loss in 9p22.1 and/or 18q21.1 indicated a small-bowel primary site. Notable loss of chromosome 18 in sbNETs is well established in the literature [27,29,30,33,34,35,36]. Potential prognostic NET biomarkers included gains in 7q31.1, 9p22.1, and 19q13.2.

FISH Validation of fCNA Differences between pNETs and sbNETs

Diagnosis
FISH is a cytogenetic technique that can be performed on formalin-fixed, paraffin-embedded (FFPE) tissue to identify gene- and locus-specific CNAs. To replicate the evidence of the diagnostic potential of our selected fCNAs and respective probes, FISH analyses were performed on 300 nuclei of pNET (n = 60, Supplementary Table S4) and sbNET (n = 82, Supplementary Table S5) FFPE specimens. Our results indicate that CCNE1 copy-number gain is the strongest indicator of the primary site, with 5× higher rates observed in pNET samples (44% vs. 9%, X² = 31.87, p < 0.00001) (Table 3). Additional copy-number gains in SMAD4 (25% vs. 5%, X² = 15.69, p = 0.00008), ERBB2 (22% vs. 6%, X² = 10.63, p = 0.001), and CDKN2A (22% vs. 7%, X² = 9.07, p = 0.003) were observed 3-5× more frequently in pNET tumors (Figure 2). Copy-number loss in SMAD4 was reported in 60% of sbNET tumors, which is consistent with the literature [27,29,30,33,34,35,36]. Normal copy-number status in CCNE1 was significantly more frequent in sbNET tumors (64% vs. 40%, X² = 10.99, p = 0.0009). There were no significant CNA differences between pNET and sbNET tumors reported in CKS1B, FGFR3, CSF1R, and MET.
Thresholding copy-number loss on FISH of FFPE sections is challenging, as signal is assessed in a 2D section of a 3D structure (i.e., the nucleus), resulting in the incorrect impression of loss in some subsets of normal nuclei. We assessed the background apparent deletion rate due to sectioning and found an average background deletion rate of 28.3%. We used this value to distinguish between loss due to artifact and true copy-number loss (Supplementary Table S6).

Prognosis
Next, we assessed whether these same fCNAs could be used as prognostic NET biomarkers. To determine if our fCNA findings could predict disease progression, we separated our specimens into late-disease-progression (LDP) and early-disease-progression (EDP) samples with a cutoff of 5.5 years until reported disease progression. FISH analyses were performed on 300 nuclei of pNET (LDP = 24, EDP = 17) and sbNET (LDP = 41, EDP = 36) FFPE specimens. Copy-number gain in CDKN2A was 2.2× more frequent in LDP pNET tumors (42% vs. 19%, X² = 12.48, p = 0.0004) (Table 4). SMAD4 copy-number gain was 2.9× more frequent in EDP pNET tumors (42% vs. 10%, X² = 11.50, p = 0.0007). There were no prognostic implications for sbNET samples using these probes in our data set (Supplementary Table S7). These results validate the use of CNAs in CDKN2A and SMAD4 as independent potential prognostic pNET biomarkers.
When assessing the prognostic impact of these probes, gain of CCNE1 was associated with early disease progression (EDP) in pNETs (gain, 0.46-0.89, CI = 0.24-0.99, p = 0.039) and sbNETs (normal, 0.58-0.91, CI = 0.41-0.99, p = 0.043) (Supplementary Table S9). In addition, FGFR3 gain was associated with EDP in sbNETs (0.70-0.95, CI = 0.52-0.99, p = 0.0004) (Supplementary Table S10). CSF1R loss was suggestive of LDP in sbNETs (0.57, CI = 0.40-0.73, p = 0.07). Unlike our single-probe analysis, CDKN2A and SMAD4 had no impact on pNET prognosis in the multivariate analysis.
We created a clinical decision tree (DT) using CNAs as diagnostic biomarkers (Figure 4). Our DT suggests that assessing copy-number status for ERBB2, CCNE1, CDKN2A, and MET is effective for differentiating between pNETs and sbNETs. If ERBB2 is gained, the tumor has a 90% chance of being from the pancreas; however, if ERBB2 is lost or normal, CCNE1 copy-number status should be assessed. If CCNE1 is gained, then the tumor has a greater than 80% chance of being from the pancreas, but if it is lost or normal, CDKN2A copy-number status should be assessed. If CDKN2A is gained or normal, the tumor has a slightly greater than 60% chance of being from the pancreas and a 40% chance of being from the small bowel. If CDKN2A is lost and MET is either lost or normal, the sample has a greater than 80% chance of being from the small bowel; however, if MET is gained and ERBB2 is normal, it has a greater than 85% chance of being from the pancreas. Lastly, if MET is gained but ERBB2 is lost, the tumor has a greater than 90% chance of being from the small bowel. SMAD4 was not differential in our DT, which may be due to CCNE1 and SMAD4 copy number being highly correlated in our dataset (0.635) (Supplementary Table S11). Overall, these findings support the use of CNAs in MET, CDKN2A, ERBB2, and CCNE1 as simple diagnostic NET biomarkers.
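The branching logic described in this paragraph can be written out as a short function. The following is only an illustrative sketch in Python: the function name, the status labels, and the returned probability strings are assumptions made for this example, the probabilities are the approximate values quoted in the text rather than the fitted values in Figure 4, and it is not the fitted rpart model itself.

```python
from typing import Tuple

# Hypothetical status labels for this sketch (not taken from the study's dataset).
LOSS, NORMAL, GAIN = "loss", "normal", "gain"
VALID = {LOSS, NORMAL, GAIN}


def predict_primary_site(erbb2: str, ccne1: str, cdkn2a: str, met: str) -> Tuple[str, str]:
    """Walk the decision tree as described in the text.

    Returns (most likely primary site, approximate probability as quoted in the text).
    """
    for status in (erbb2, ccne1, cdkn2a, met):
        if status not in VALID:
            raise ValueError(f"unknown copy-number status: {status!r}")

    # Node 1: ERBB2 gained points to the pancreas.
    if erbb2 == GAIN:
        return ("pancreas", "about 90%")
    # Node 2: ERBB2 lost or normal, so check CCNE1.
    if ccne1 == GAIN:
        return ("pancreas", ">80%")
    # Node 3: CCNE1 lost or normal, so check CDKN2A.
    if cdkn2a in (GAIN, NORMAL):
        return ("pancreas", "slightly >60%")
    # Node 4: CDKN2A lost, so check MET.
    if met in (LOSS, NORMAL):
        return ("small bowel", ">80%")
    # Node 5: MET gained; ERBB2 is either normal or lost at this point.
    if erbb2 == NORMAL:
        return ("pancreas", ">85%")
    return ("small bowel", ">90%")
```

For example, a specimen with ERBB2 lost, CCNE1 normal, CDKN2A lost, and MET gained follows the last branch and is assigned to the small bowel.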

Discussion
The aim of this paper was to determine if CNAs could be used as NET biomarkers by integrating patient-matched CNA and gene-expression data to identify fCNAs. We replicated fCNAs by selecting probes based on their location within fCNAs of interest using FISH on patient-specimen-derived tissue microarrays with the goal of developing a fast clinical DT that could be applied in any testing laboratory to expedite medical decision-making.
FISH is a widely available methodology that offers a cost-effective approach with relatively rapid turnaround time. Through the identification, replication, and prioritization of FISH probes in this study, there is potential to improve the diagnostic assessment of NETs while reducing the cost and the processing time of results, thus improving patient care.
Our findings support the use of CNAs in MET, ERBB2, SMAD4, and CCNE1 as diagnostic NET biomarkers and resulted in an AUC of 0.902. This compares favorably to a previously published immunohistochemistry-based algorithm that resulted in an ROC of 0.864 for distinguishing between tumors of pancreatic versus small-bowel or pulmonary origin [38]. As previously mentioned, IHC is currently used to infer site of origin in metastatic NETs of occult origin [14][15][16]. Taking the evidence together, we recommend assessing the copy-number status of MET, CDKN2A, ERBB2, and CCNE1 to differentiate between pNETs and sbNETs. Although SMAD4 was statistically associated with primary site, its copy-number status was highly correlated with that of CCNE1 (0.635).
The current leading NET prognostic biomarkers include tumor grade, tumor stage, and IHC analysis of genes such as p53 and ATRX/DAXX [15,23,25]. Our findings suggest that FGFR3, CDKN2A, SMAD4, and CCNE1 are valuable prognostic NET biomarkers. Many studies of other tumor types show that CCNE1 gain is a key marker of poor prognosis that may dictate more aggressive clinical management [39][40][41][42][43][44][45].
Limitations of our study include the moderate sample size of the dataset. This reflects the rarity of NETs, a factor that makes it challenging to gather larger cohorts, and this is especially limiting for our time-to-progression analysis. For our CMA discovery cohort, the average PFS was 2.4 years, which is different from the PFS of our FISH validation cohort and that in the literature, which is approximately 5.5 years [27][28][29][30,33]. Another limitation is the presence of missing values in the dataset. To address this, we used the median score for each probe to impute the missing data; however, observed FISH results for all variables and all samples would provide more accurate overall results. A general limitation is that data gathered from clinical samples and electronic health records are not always complete across all samples.
To our surprise, there were no significant CNA differences in FGFR3 and CSF1R between pNETs and sbNETs using FISH and logistic regression analyses even though we observed significant CNA differences in our CMA-data discovery cohort. Secondly, CKS1B copy-number gain was not associated with PFS in this study but has been correlated with poor PFS in pan-cancer analyses [46][47][48][49][50][51][52]. It is important to note that those studies did not include assessments of the correlation between GEP NET prognosis and CKS1B copy-number status. Lastly, we determined there was no significant difference in CDKN2A copy-number loss between pNETs and sbNETs, although our CMA data suggested otherwise. CDKN2A has been shown to be a tumor suppressor in pNETs, and focal CDKN2A loss has been observed [28]. Additionally, CDKN2A loss might not have been observed due to the inherent challenges of assessing loss (as opposed to gain) in FFPE samples.
In summary, we found significant CNA differences in CDKN2A, ERBB2, SMAD4, and CCNE1 between pNET and sbNET tumor samples after combined analyses of chromosomal microarrays, RNA sequencing, and fluorescence in situ hybridization data. Using the FISH data, we performed logistic regression analysis and derived performance metrics while developing a clinical decision tree to help determine the primary tumor site and guide risk-stratified therapeutic decisions for metastatic tumors of unknown origin detected in the liver. This combinatorial approach to biomarker identification has proven highly effective and may represent a powerful way to define clinically relevant biomarkers for additional NET primary sites in the future.

Patient Cohort
All patients in this single-institution study were enrolled under an Institutional Review Board-approved protocol. The CMA discovery cohort consisted of patient-matched normal primary metastatic tumors of 47 pNETs (28 primary, 19 metastatic) and 63 sbNETs (32 primary, 31 metastatic) (Supplementary Figure S3). Normal primary metastatic tumors include the primary and metastatic tumor specimens as well as normal, healthy adjacent tissue derived from the same patient. The FISH replication cohort consisted of patient-matched normal primary metastatic tumors of 84 pNETs (76 primary, 3 metastatic liver, 5 metastatic lymph node) and 98 sbNETs (86 primary, 7 metastatic liver, 5 metastatic lymph node). Patient-cohort demographic data are listed in Supplementary Table S12. Fresh tissue samples were collected and placed in RNAlater solution (Thermo Fisher Scientific, Waltham, MA, USA), and nucleic acids were isolated using the RNeasy Plus Universal Mini Kit (Qiagen, Valencia, CA, USA) or DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA) per the protocols recommended by the manufacturers.

Microarray Protocols
Microarray experiments were performed using the NimbleGen Human CGH 720 K Whole-Genome Tiling version 3.0 array (Roche NimbleGen; Madison, WI, USA) and the Affymetrix CytoScan High-Definition (HD) array (Affymetrix array, Santa Clara, CA, USA) according to the manufacturers' instructions. Log2 ratio values and quality-control metrics were calculated using the NimbleScan software tool (version 2.5; Roche NimbleGen) or the ChAS (Chromosome Analysis Software) tool (version 1.1.2; Affymetrix) (CytoScan). CNA calling and data interpretation were performed using the Nexus Copy Number software (version 6.1, BioDiscovery; El Segundo, CA, USA) and the rank segmentation algorithm (for NimbleGen arrays) or the SNPRank segmentation algorithm (CytoScan HD) supplied with the Nexus software suite (version 2.5; Roche NimbleGen) [53]. Allele-specific copy-number analysis of tumors (ASCAT) was performed to identify ploidy and percent normal tissue.

RNA Processing
RNA-seq was performed within the Genomics Division of the University of Iowa Institute of Human Genetics (University of Iowa, Iowa City, IA, USA) using the Illumina TruSeq protocol (Illumina, Inc., San Diego, CA, USA), as previously described [54]. RNA-seq count data were normalized using FPKM. To identify differentially expressed genes, TopHat (v 2.1.0, Johns Hopkins University, Baltimore, MD, USA), Cuffquant, Cuffnorm, and Cuffdiff were run, comparing tumors to healthy adjacent pancreatic or small-bowel tissue. Statistically significant expression change was determined by assessing the false discovery rate (FDR)-adjusted p-value (q-value), with significance defined as q < 0.05.
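To make the q-value criterion concrete, the sketch below computes FDR-adjusted p-values in Python. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly, and the function name is hypothetical; in the study itself the q-values were produced by the Cufflinks tools rather than by code like this.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_genes(p_values: List[float], threshold: float = 0.05) -> List[bool]:
    """Flag tests whose q-value falls below the threshold (q < 0.05 in the text)."""
    return [q < threshold for q in benjamini_hochberg(p_values)]
```

With the illustrative p-values 0.01, 0.04, 0.03, and 0.5, only the first test stays below q = 0.05 after adjustment, even though three raw p-values are below 0.05.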

fCNA Identification
The R package (v 3.4.4) iGC [55] integrates sample-paired copy-number and gene-expression analysis to identify concordant differential gene expression. Log2 copy-number values from NimbleGen (Roche NimbleGen; Madison, WI, USA) or Affymetrix (Affymetrix array, Santa Clara, CA, USA) were utilized to analyze the association between copy number and mRNA levels of 31 pNET and 33 sbNET samples. CNAs that were observed in at least 20% of samples were assessed. Allelic-imbalance and loss-of-heterozygosity lesions were removed. Genes for which expression was predicted to be driven by either copy-number gain (and increased expression) or copy-number loss (and decreased expression) events and which met previously defined p and FDR thresholds were retained. No fold-change threshold was utilized for the iGC analysis.
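The concordance rule described here (a gain paired with increased expression, or a loss paired with decreased expression) can be illustrated with a small Python sketch. This is not the iGC algorithm; the record type, field names, and gene labels are hypothetical, the 20% frequency cutoff is taken from this section, and the q < 0.05 cutoff is assumed from the RNA-processing criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneCall:
    """Hypothetical per-gene summary used only for this illustration."""
    gene: str
    cna: str                  # "gain" or "loss"
    cna_frequency: float      # fraction of samples carrying the CNA
    log2_fold_change: float   # tumor vs. normal expression
    q_value: float            # FDR-adjusted p-value for the expression change


def is_concordant(call: GeneCall, min_frequency: float = 0.20, max_q: float = 0.05) -> bool:
    """True when the CNA is frequent enough and expression moves in the same direction."""
    if call.cna_frequency < min_frequency or call.q_value >= max_q:
        return False
    if call.cna == "gain":
        return call.log2_fold_change > 0
    if call.cna == "loss":
        return call.log2_fold_change < 0
    return False


def candidate_fcnas(calls: List[GeneCall]) -> List[str]:
    """Names of genes that pass the concordance filter, in input order."""
    return [c.gene for c in calls if is_concordant(c)]
```

No fold-change magnitude is required in this sketch, mirroring the statement that no fold-change threshold was used; only the sign of the change matters.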

Fluorescence In Situ Hybridization (FISH)
TMAs were assembled from formalin-fixed, paraffin-embedded lesions of 60 pNET and 82 sbNET primary and metastatic specimens arrayed in triplicate (2-micron sections). FISH studies were performed using Empire Genomics FISH probes CKS1B-20-OR, FGFR3-20-OR, CSFR1-20-GR, MET-20-GR, CDKN2A-20-GR, ERBB2-20-OR, SMAD4-20-GR, and CCNE1-20-OR (Empire Genomics, Inc., New York, NY, USA) per the protocol recommended by the manufacturer (empiregenomics.com/resources/protocols-procedures). One hundred nuclei per section and three sections per patient specimen, for a total of three hundred nuclei, were scored. Briefly, slides were baked for a minimum of 8 h at 45 °C, deparaffinized with Hemo-De, and then rehydrated before they were heated in Dako pre-treatment 20×. The samples were digested with pepsin at 37 °C for 5 min. After digestion, slides were washed with Dako buffer and dehydrated with a series of 70%, 85%, and 100% ethanol. The slides were then probed, denatured, and hybridized for 48 h at 37 °C. Unbound probe was washed from the slides, and they were counterstained with DAPI. FISH slides were analyzed at 100× magnification using CytoVision (Leica Microsystems, Wetzlar, Germany) filters.
Counts for loss-, normal-, and gain-signal-pattern nuclei were averaged for each specimen and probe set. Statistical differences were assessed using the Chi-Square test with significance defined as p < 0.05. Firth's bias-reduced logistic regression was used to estimate the predicted probability of association of CNAs with a given tissue of origin. Copy-number loss was defined as occurrence observed in a proportion less than or equal to 28.3% of nuclei showing fewer than two copies to account for "pseudo-loss" due to the plane of sectioning. Copy-number gain was defined as occurrence observed in a proportion greater than or equal to 15.3% of nuclei. These values were calculated by assessing the average percent copy-number loss, normal copy-number status, and copy-number gain within our entire dataset and selecting the greatest-variable-range value for pNETs (loss) and sbNETs (gain), respectively. Duplicate CNAs of the same probe set with the largest variable ranges were chosen. Samples outside of these loss and gain thresholds were determined to be normal. Descriptive statistics for copy-number loss, normal copy-number status, and copy-number gain are listed in Supplementary Table S13.
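As an illustration of the Chi-Square comparison named above, the following Python sketch computes a Pearson chi-square statistic and p-value for a 2 × 2 table (for example, gain vs. no gain in pNETs vs. sbNETs). It assumes no continuity correction and one degree of freedom; the text does not state how the tables were constructed, so the function name and any counts passed to it are hypothetical and the output is not expected to reproduce the reported statistics.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom and no
    continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A call such as `chi_square_2x2(20, 10, 10, 20)` compares two groups of 30 specimens in which 20 and 10, respectively, show the event of interest.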

Logistic Regression Analysis and DT Development
Logistic regression and statistical analyses were performed by assessing the 8 biomarker-locus variables in the FISH dataset. To do so, each biomarker in each sample was given a score. The score was determined by calculating the percent normal nuclei, subtracting the percent loss nuclei, and adding the percent gain nuclei. Therefore, a sample probed with a biomarker that resulted in 60% normal, 28% loss, and 12% gain nuclei would be given a score of 44 (60 − 28 + 12). Loss, normal, and gain values were determined by thresholds set based on the original variables. Missing data were imputed with the median score for each probe. The biomarker and response variables were tested within a pipeline with the Analysis of Overdispersed Data (aod, version 1.3.2) and ggplot2 (version 3.4.4) R (v 4.3.1) packages.
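The scoring and imputation steps described above can be sketched in a few lines of Python. The function names are hypothetical and the analysis in the study was carried out in R; this block only restates the arithmetic (normal minus loss plus gain) and the per-probe median imputation.

```python
from statistics import median
from typing import List, Optional


def biomarker_score(pct_normal: float, pct_loss: float, pct_gain: float) -> float:
    """Score = percent normal nuclei - percent loss nuclei + percent gain nuclei."""
    return pct_normal - pct_loss + pct_gain


def impute_with_median(scores: List[Optional[float]]) -> List[float]:
    """Replace missing scores (None) for one probe with the median of the observed scores."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("cannot impute a probe with no observed scores")
    fill = median(observed)
    return [fill if s is None else s for s in scores]
```

Using the worked example from the text, `biomarker_score(60, 28, 12)` gives 44; a probe with scores 44, missing, 50, and 30 would have the missing value filled with the median of the observed scores, 44.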
DT dataset curation was performed by assessing the 8 biomarker-locus variables in the FISH dataset categorically. For example, if a probe for any given sample was lost, a value of 1 was designated, while normal and gains were assigned values of 2 and 3, respectively. Missing data points were assigned a value of 0. The decision tree was developed using the rpart (version 4.

Informed Consent Statement: Informed written consent was provided by all patients in accordance with a protocol approved by the University of Iowa Institutional Review Board (IRB Number 199911057), and studies were conducted in accordance with the Belmont Report.

Figure 1. CMA data comparing pNET and sbNET tumor samples. The top row indicates chromosomes 1-22. Below is the frequency of the CNA events. Significance indicated by blue significance line. Genes, exons, CNAs, microRNAs (miRNAs), and cancer-related genes are listed below the significance bar. Blue represents copy-number gain. Red represents copy-number loss.

Figure 4. Clinical DT models using CNAs in ERBB2, CCNE1, CDKN2A, and MET as NET biomarkers. The shading of the boxes represents the most likely primary tumor site, while the numbers on the right-hand side represent the predicted probabilities by site.

Funding: This research was funded by an NCI NET SPORE grant to B.D., D.Q., J.H. and T.B. (P50 CA174521), an Interdisciplinary Genetics T32 Predoctoral Training Grant to H.V. (GM 008629), an NCI R01 grant to D.Q., B.D., P.B. and A.B. (CA260200), and by the Stead Family Department of Pediatrics departmental funds.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki. Patients presenting to the University of Iowa with pNETs and sbNETs were consented for genetic studies and entered into a tumor registry approved by the University of Iowa Institutional Review Board (IRB#: 199911057).

Table 1. fCNA loci and examples of genes within those loci that exhibited copy-number alterations and concordant gene-expression changes.

Table 2. Differential panel of CNAs and associated commercially available and clinically used cancer-related gene probes.

Table 3. FISH results assessing the average percentage of copy-number loss, normal copy-number status, and gain of biomarkers between pNETs and sbNETs. X² = Chi-square.

Table 4. FISH results assessing the average percentage of copy-number loss, normal copy-number status, and gain of biomarkers between LDP and EDP.

Table 5. Logistic regression results based on primary tumor site. (a) Reported scores of statistical analyses per probe. (b) Quality control metrics for the overall model with reported AUC. CI = confidence interval. AUC = area under the ROC curve.